Performance Evaluation of Grid Based Multi-Attribute Record Declustering Methods
Abstract
The I/O subsystem is widely accepted as one of the principal bottlenecks for high-performance parallel database systems. The emergence of parallel I/O architectures has made the problem of data declustering, i.e., fragmenting a file of records and allocating the pieces to different disks, one of prime importance. This is evident from the growing activity in this area. In this study we focus only on multi-attribute declustering methods that are based on some type of grid-based partitioning of the data space. While a number of such declustering methods exist, we believe a good performance evaluation of their relative merits is lacking. Almost all performance analyses so far have been theoretical: exact conditions on the number of disks, the sizes of attribute domains, and query shapes and sizes have been derived under which a certain declustering method is optimal. Also, most of these conditions apply to partial match queries. We believe that in practice putting restrictions on the size of attribute domains is debatable, and putting them on the shape and size of queries is unacceptable. Thus, to answer the question "How do various declustering schemes perform under a wide range of query and database scenarios (both relative to each other and to the optimal)?", we have carried out a detailed performance evaluation. Parameters that are varied include the shape and size of queries, the database size and number of attributes, and the number of disks. The theoretical contribution of this paper is in showing that there exists no declustering method that is strictly optimal for range queries if the number of disks is more than 5.
Our findings are: (i) for large queries all methods perform almost the same and are close to optimal, (ii) there can be a substantial difference for small queries, (iii) performance of the methods is quite sensitive to query shape, and (iv) the relative difference between the methods' performance as well as their deviation from optimality decreases with the size and number of attributes in a query. Thus, we conclude that information about common queries on a relation ought to be used in deciding the declustering for it, and that this is especially crucial for small queries. Also, since there is no clear winner, parallel database systems must support a number of declustering methods.
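The abstract does not name the methods it compares, so the following is only a minimal sketch of the kind of measurement it describes. It assumes the well-known Disk Modulo allocation (disk = sum of grid coordinates mod number of disks) as a stand-in grid-based method, and it takes the response time of a range query to be the largest number of grid cells that any single disk must fetch. The function names and the sample queries are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import ceil


def disk_modulo(cell, num_disks):
    """Assumed allocation rule: disk = (sum of grid coordinates) mod M."""
    return sum(cell) % num_disks


def response_time(query, num_disks, allocate=disk_modulo):
    """Largest number of grid cells any one disk serves for a range query.

    `query` is a list of inclusive (lo, hi) grid-index ranges, one per attribute.
    """
    ranges = [range(lo, hi + 1) for lo, hi in query]
    load = Counter(allocate(cell, num_disks) for cell in product(*ranges))
    return max(load.values())


def optimal_response_time(query, num_disks):
    """Lower bound: the query's cells spread perfectly evenly over the disks."""
    cells = 1
    for lo, hi in query:
        cells *= hi - lo + 1
    return ceil(cells / num_disks)


if __name__ == "__main__":
    num_disks = 4
    queries = {
        "small square (2x2)": [(0, 1), (0, 1)],
        "small row (4x1)": [(0, 3), (0, 0)],
        "large square (8x8)": [(0, 7), (0, 7)],
    }
    for name, query in queries.items():
        actual = response_time(query, num_disks)
        best = optimal_response_time(query, num_disks)
        print(f"{name}: response time {actual}, optimal {best}")
```

Under these assumptions the two small queries cover the same number of cells, yet the square one needs two accesses on its busiest disk while the row needs one, and the large query matches the lower bound. This is consistent in spirit with findings (i) to (iii), although it says nothing about the paper's actual experiments.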
Similar Resources
Dynamic Declustering Methods for Parallel Grid Files
Several declustering functions for distributing multi-attribute data on a set of disks have been proposed in recent years. Since these functions map grid regions to disks in a static way, performance deteriorates in case of dynamic datasets and/or non-stationary data distributions. In this paper we first analyze how declustering functions can be extended in order to deal with dynamic datasets w...
Multi-Site Declustering Strategies for Very High Database Service Availability
The thesis introduces the concept of multi-site declustering strategies with self repair for databases demanding very high service availability. Existing work on declustering strategies is centered around providing high performance and reliability inside a small geographical area (site). Applications demanding robustness against site failures like fire and power outages cannot use these meth...
Latin Hypercubes: A Class of Multidimensional Declustering Techniques
The I/O subsystem is widely accepted as one of the principal bottlenecks for high-performance parallel database systems. The emergence of parallel I/O architectures has made the problem of data declustering, i.e., fragmenting a file of records and allocating the pieces to different disks, one of prime importance. This is evident from the growing activity in this area. In this study we focus only ...
Study of Scalable Declustering Algorithms for Parallel Grid Files
Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known acce...
A Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries
Multi-disk systems, coupled with declustering schemes, have been widely used in various applications to improve I/O performance by enabling parallel disk accesses. A declustering scheme determines how data blocks should be placed among multiple disks to maximize the parallelism. We focus on the problem of declustering grid-structured multidimensional data with the objective of reducing the resp...
Publication date: 1994